Llama ("Large Language Model Meta AI" serving as a backronym) is a family of large language models (LLMs) released by Meta AI starting in February 2023.
Llama models come in a range of sizes, from 1 billion to 2 trillion parameters. The first version was released only as a foundation model; starting with Llama 2, Meta AI has released instruction fine-tuned versions alongside the foundation models.
Model weights for the first version of Llama were only available to researchers on a case-by-case basis, under a non-commercial license. Unauthorized copies of the first model were shared via BitTorrent. Subsequent versions of Llama were made accessible outside academia and released under licenses that permitted some commercial use.
Alongside the release of Llama 3, Meta rolled out Meta AI, an AI assistant built on Llama. Meta AI has a dedicated website and is available on Facebook and WhatsApp. The latest version is Llama 4, released in April 2025.
In contrast to other companies' responses to ChatGPT, Meta's chief AI scientist Yann LeCun stated that large language models are best suited for aiding with writing.
Llama was trained only on publicly available information, at various model sizes, with the intention of making the model accessible to a wider range of hardware. It was released exclusively as a foundation model, although the paper included examples of instruction fine-tuned versions.
Meta AI reported that the 13B-parameter model's performance on most NLP benchmarks exceeded that of the much larger GPT-3 (175B parameters), and that the largest 65B model was competitive with state-of-the-art models such as PaLM and Chinchilla.
Reactions to the leak varied. Some speculated that the model would be used for malicious purposes, such as more sophisticated spamming. Some have celebrated the model's accessibility, as well as the fact that smaller versions of the model can be run relatively cheaply, suggesting that this will promote the flourishing of additional research developments. Multiple commentators, such as Simon Willison, compared Llama to Stable Diffusion, a text-to-image model which, unlike comparably sophisticated models which preceded it, was openly distributed, leading to a rapid proliferation of associated tools, techniques, and software.
Llama 2 includes foundation models and models fine-tuned for chat. In a further departure from the original version of Llama, all models are released with weights and may be used for many commercial use cases. Because Llama's license enforces an acceptable use policy that prohibits Llama from being used for some purposes, it is not open source. Meta's use of the term open-source to describe Llama has been disputed by the Open Source Initiative (which maintains The Open Source Definition) and others.
Code Llama is a fine-tune of Llama 2 on code-specific datasets. 7B, 13B, and 34B versions were released on August 24, 2023, with a 70B version released on January 29, 2024. Starting from the Llama 2 foundation models, Meta AI trained on an additional 500B tokens of code data, followed by 20B tokens of long-context data, to create the Code Llama foundation models. These foundation models were further trained on 5B tokens of instruction-following data to create the instruct fine-tunes. A separate foundation model was created for Python code, trained on 100B tokens of Python-only code before the long-context data.
Regarding scaling laws, Llama 3 models empirically showed that when a model is trained on data that is more than the "Chinchilla-optimal" amount, the performance continues to scale log-linearly. For example, the Chinchilla-optimal dataset for Llama 3 8B is 200 billion tokens, but performance continued to scale log-linearly to the 75-times larger dataset of 15 trillion tokens.
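The gap between the two figures can be illustrated with a back-of-the-envelope calculation. The sketch below assumes the commonly cited Chinchilla heuristic of roughly 20 training tokens per parameter (an approximation of the fitted scaling law, not a figure from the Llama 3 paper):

```python
# Back-of-the-envelope check of the figures cited above: the Chinchilla-optimal
# dataset for an 8B-parameter model is roughly 20 tokens per parameter, and
# Llama 3 was trained far past that point.

params = 8e9                      # Llama 3 8B
heuristic_tokens = 20 * params    # ~160B tokens from the ~20 tokens/param rule of thumb
cited_optimum = 200e9             # Chinchilla-optimal figure cited in the text
actual_tokens = 15e12             # tokens actually used to train Llama 3

print(f"20 tokens/param heuristic : {heuristic_tokens / 1e9:.0f}B tokens")
print(f"Cited Chinchilla optimum  : {cited_optimum / 1e9:.0f}B tokens")
print(f"Actual training data      : {actual_tokens / 1e12:.0f}T tokens "
      f"({actual_tokens / cited_optimum:.0f}x the cited optimum)")
```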
During an interview with Dwarkesh Patel, Mark Zuckerberg said that the 8B version of Llama 3 was nearly as powerful as the largest Llama 2. Zuckerberg stated that the team was surprised the 70B model was still learning even at the end of its 15T-token training run; the decision was made to end training in order to focus GPU resources elsewhere.
Llama 3.1 was released on July 23, 2024, with three sizes: 8B, 70B, and 405B parameters.
The training data included publicly available data, licensed data, and Meta-proprietary data such as publicly shared posts from Instagram and Facebook and people’s interactions with Meta AI. The knowledge cutoff was August 2024.
Meta claimed in its release announcement that Llama 4 bested GPT-4o's score on the LMArena AI benchmark. The company also stated that Llama 4's benchmark score was achieved using an unreleased "experimental chat version" of the model that was "optimized for conversationality", which differed from the version of Llama 4 released to the public. LMArena indicated that it would change its policies to prevent this incident from reoccurring, and responded, "Meta's interpretation of our policy did not match what we expect from model providers. Meta should have made it clearer that 'Llama-4-Maverick-03-26-Experimental' was a customized model to optimize for human preference." Some users criticized Meta on social media for its use of a separate model version tailored for benchmarking, and some additionally accused Meta of training Llama 4 on benchmark test sets to further boost its scores, which Meta denied.
Llama 1 foundational models were trained on a data set with 1.4 trillion tokens, drawn from publicly available data sources, including webpages scraped by CommonCrawl, open source repositories from GitHub, Wikipedia in multiple languages, public domain books, the LaTeX source of scientific papers from ArXiv, and questions and answers from Stack Exchange.
In April 2023, Together AI launched a project named RedPajama to reproduce and distribute an open-source version of the Llama dataset, initially containing approximately 1.2 trillion tokens.
Llama 2 foundational models were trained on a data set with 2 trillion tokens. This data set was curated to remove websites that often disclose personal data, and it upsamples sources considered trustworthy. Llama 2-Chat was additionally fine-tuned on 27,540 prompt-response pairs created for this project, which performed better than larger but lower-quality third-party datasets. For AI alignment, reinforcement learning with human feedback (RLHF) was used with a combination of 1,418,091 Meta examples and seven smaller datasets. The average dialog depth was 3.9 in the Meta examples, 3.0 for the Anthropic Helpful and Anthropic Harmless sets, and 1.0 for five other sets, including OpenAI Summarize and StackExchange.
Llama 3's training data consists mainly of English text, with over 5% of the data in more than 30 other languages. The dataset was filtered by a text-quality classifier, which was trained on text synthesized by Llama 2.
In a lawsuit brought by Richard Kadrey and others against Meta Platforms, CEO Mark Zuckerberg was alleged to have authorized the use of copyrighted content from Library Genesis to train Llama AI models and conceal its actions by removing copyright markers from the data.
For AI alignment, human annotators wrote prompts and then compared two model outputs (a binary comparison protocol), giving confidence levels and separate safety labels with veto power. Two separate reward models, one for safety and one for helpfulness, were trained from these preferences and used in reinforcement learning from human feedback (RLHF). A major technical contribution is the departure from the exclusive use of proximal policy optimization (PPO) for RLHF: a new technique based on rejection sampling was used, followed by PPO.
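The rejection-sampling step can be pictured as a best-of-N selection loop: sample several candidate responses per prompt, score each with the reward model, and keep only the highest-scoring one as a fine-tuning target. The following is a minimal sketch of that idea, not Meta's implementation; generate and reward_model are hypothetical stand-ins:

```python
import random

def generate(prompt: str) -> str:
    # Stand-in for sampling a response from the current language model.
    return f"response-{random.randint(0, 999)} to {prompt!r}"

def reward_model(prompt: str, response: str) -> float:
    # Stand-in for the learned helpfulness/safety reward model.
    return random.random()

def rejection_sample(prompts: list[str], n_candidates: int = 8) -> list[tuple[str, str]]:
    # For each prompt, draw N candidates, score them, and keep the best one.
    selected = []
    for prompt in prompts:
        candidates = [generate(prompt) for _ in range(n_candidates)]
        best = max(candidates, key=lambda r: reward_model(prompt, r))
        selected.append((prompt, best))
    return selected

# The selected (prompt, best-of-N response) pairs would then be used for
# further fine-tuning, followed by a PPO stage in Llama 2's pipeline.
if __name__ == "__main__":
    print(rejection_sample(["Explain RLHF briefly."], n_candidates=4))
```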
Multi-turn consistency in dialogs was targeted for improvement, to make sure that "system messages" (initial instructions, such as "speak in French" and "act like Napoleon") are respected during the dialog. This was accomplished using the new "Ghost attention" technique during training, which concatenates relevant instructions to each new user message but zeros out the loss function for tokens in the prompt (earlier parts of the dialog).
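A simplified way to picture the loss masking involved is shown below: the system instruction is prepended to the latest user turn, and labels for all prompt tokens (everything except the assistant's reply) are set to an ignore value so they contribute nothing to the loss. This is an illustrative sketch of the general masking idea, not Meta's training code; the toy token IDs and the -100 ignore index follow a common convention rather than anything specified for Llama 2.

```python
# Sketch of prompt-token loss masking: the system instruction is concatenated
# to the user turn, and only the assistant's reply tokens keep their labels.
# Toy token IDs; -100 is a commonly used "ignore" label value (an assumption here).

IGNORE_INDEX = -100

def build_example(system: list[int], user: list[int], reply: list[int]):
    # Input sequence: [system instruction] + [user message] + [assistant reply]
    input_ids = system + user + reply
    # Labels: mask the prompt portion so the loss is computed only on the reply
    labels = [IGNORE_INDEX] * (len(system) + len(user)) + reply
    return input_ids, labels

system_tokens = [101, 102, 103]   # e.g. tokens for "act like Napoleon"
user_tokens = [201, 202]          # tokens for the latest user message
reply_tokens = [301, 302, 303, 304]

ids, labels = build_example(system_tokens, user_tokens, reply_tokens)
print(ids)     # [101, 102, 103, 201, 202, 301, 302, 303, 304]
print(labels)  # [-100, -100, -100, -100, -100, 301, 302, 303, 304]
```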
Meditron is a family of Llama-based models fine-tuned on a corpus of clinical guidelines, PubMed papers, and articles. It was created by researchers at the École Polytechnique Fédérale de Lausanne School of Computer and Communication Sciences and the Yale School of Medicine. It shows improved performance on medical benchmarks such as MedQA and MedMCQA.
Zoom used Meta Llama 2 to create an AI Companion that can summarize meetings, provide presentation tips, and assist with message responses; the AI Companion is powered by multiple models, including Meta Llama 2.
Reuters reported in 2024 that many Chinese foundation models relied on Llama models for their training.
llamafile, created by Justine Tunney, is an open-source tool that bundles llama.cpp with a model into a single executable file. Tunney et al. introduced new optimized matrix-multiplication kernels for x86 and ARM CPUs, improving prompt-evaluation performance for FP16 and 8-bit quantized data types.
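As a rough illustration of what an 8-bit quantized kernel operates on, the sketch below shows symmetric int8 weight quantization and a dequantizing dot product. It conveys the general idea only; it is not llama.cpp's actual block format or Tunney's optimized implementation.

```python
# Sketch of 8-bit weight quantization and a dequantizing dot product.
# Illustrative only; real quantized-matmul kernels use per-block scales,
# SIMD instructions, and packed formats not shown here.

def quantize_int8(weights: list[float]) -> tuple[list[int], float]:
    # Symmetric quantization: map floats onto [-127, 127] with one scale factor.
    scale = max(abs(w) for w in weights) / 127.0 or 1.0
    return [round(w / scale) for w in weights], scale

def dot_int8(q_weights: list[int], scale: float, activations: list[float]) -> float:
    # Accumulate in the quantized domain, then apply the scale once at the end.
    return scale * sum(q * a for q, a in zip(q_weights, activations))

weights = [0.12, -0.53, 0.99, 0.07]
acts = [1.0, 2.0, -1.0, 0.5]

q, s = quantize_int8(weights)
print("exact     :", sum(w * a for w, a in zip(weights, acts)))
print("quantized :", dot_int8(q, s, acts))   # close to the exact result
```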
Since the release of Llama 2, Meta has presented Llama as open-source, a description opposed by the Open Source Initiative (OSI) and some academics and journalists. The OSI stated that Llama's licenses do not meet several provisions of its policy document The Open Source Definition (OSD), which prohibits open-source software licenses from discriminating against "persons or groups" and "fields of endeavor", and accused Meta of openwashing Llama. According to the OSI, Llama 2's license prevented the software from being used commercially in some cases and restricted use in fields including controlled substances and critical infrastructure, while later versions of Llama's license also disallowed use by any individual in the European Union. The OSI published The Open Source AI Definition in October 2024, which requires open-source AI to be released with details about its training data that Meta does not disclose for Llama. A Meta spokesperson responded to The Verge that the company disagrees with this definition. The Free Software Foundation classified Llama 3.1's license as a nonfree software license in January 2025, criticizing its acceptable use policy, restrictions against users with popular applications, and enforcement of trade regulations outside the user's jurisdiction.
In its coverage of Llama 2, Ars Technica initially echoed Meta's use of the term open-source, but later revised its reporting to describe Llama as "source-available", "openly licensed", and "weights available" after the publication recognized that Llama 2's license disallowed entities with over 700 million monthly active users from using the LLM and disallowed the LLM's outputs from being used to improve other LLMs. In July 2023, Radboud University researchers scored Llama 2 with the second-lowest "openness" ranking in a comparison of 20 LLMs, with ChatGPT being assigned the lowest ranking. One of the researchers, Mark Dingemanse, criticized Meta's use of the term open-source for Llama 2 as "positively misleading", because "There is no source to be seen, the training data is entirely undocumented, and beyond the glossy charts the technical documentation is really rather poor." CIO, in November 2024, stated that Llama was not open-source due to its acceptable use policy, a 630-word document that "puts it at odds with the broader open-source movement". Later that month, a Nature article asserted that describing Llama 3 as "open" is a case of "'openwashing' systems that are better understood as closed", as Llama 3 provides "little more than an API or the ability to download a model subject to distinctly non-open use restrictions".
The response to Meta's integration of Llama into Facebook was mixed, with some users confused after Meta AI told a parental group that it had a child.
The release of Llama models has sparked significant debate about the benefits and misuse risks of open-weight models. Such models can be fine-tuned by malicious actors, including cybercriminals, to strip out safeguards so that they comply with harmful requests. Some experts contend that future models may facilitate causing damage more than defending against it, for example by making it relatively easy to engineer advanced bioweapons without specialized knowledge. Conversely, open-weight models are useful for a wide variety of purposes, including safety research.
Open Source Initiative head Stefano Maffulli criticized Meta for describing Llama as open-source, saying that it was causing confusion among users and "polluting" the term.
Alongside the Llama 4 Scout and Maverick models, Meta also announced a larger Behemoth model, which was not released. Meta claimed it was a mixture-of-experts model with 16 experts, 288 billion active parameters, and around 2 trillion parameters in total; it was still in training when Scout and Maverick were released. Maverick was codistilled from Behemoth, while Scout was trained from scratch.
Key hyperparameters of Llama 3.1

|                       | 8B                  | 70B                 | 405B                |
| Layers                | 32                  | 80                  | 126                 |
| Model dimension       | 4,096               | 8,192               | 16,384              |
| FFN dimension         | 14,336              | 28,672              | 53,248              |
| Attention heads       | 32                  | 64                  | 128                 |
| Key/value heads       | 8                   | 8                   | 8                   |
| Peak learning rate    | 3 × 10^-4           | 1.5 × 10^-4         | 0.8 × 10^-4         |
| Activation function   | SwiGLU              | SwiGLU              | SwiGLU              |
| Vocabulary size       | 128,000             | 128,000             | 128,000             |
| Positional embeddings | RoPE (θ = 500,000)  | RoPE (θ = 500,000)  | RoPE (θ = 500,000)  |
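These hyperparameters can be cross-checked against the advertised model sizes with a simplified parameter count. The estimate below covers the 8B configuration, ignores normalization parameters, and assumes untied input/output embeddings and grouped-query attention with the listed key/value heads:

```python
# Rough parameter-count estimate for the 8B configuration from the table above.
# Simplifications: RMSNorm parameters ignored; input/output embeddings assumed untied.

layers = 32
d_model = 4096
d_ffn = 14336
n_heads = 32
n_kv_heads = 8
vocab = 128_000

head_dim = d_model // n_heads          # 128
kv_dim = n_kv_heads * head_dim         # 1024

attn = 2 * d_model * d_model           # query and output projections
attn += 2 * d_model * kv_dim           # key and value projections (grouped-query attention)
ffn = 3 * d_model * d_ffn              # gate, up, and down projections (SwiGLU)
per_layer = attn + ffn

embeddings = 2 * vocab * d_model       # input embedding table + output head

total = layers * per_layer + embeddings
print(f"~{total / 1e9:.2f}B parameters")   # roughly 8B, matching the model name
```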